@jangorecki
Member

@jangorecki jangorecki commented Oct 17, 2025

Towards #7371

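This code assumes you have 8+ threads.

# data.table dev helper (recompiles and reloads the sources)
cc()
do = function(what) {
  n = 1e8  # total iterations, split across 8 threads
  # outer loop: iteration at which the halt flag is raised
  rbindlist(lapply(c("1e8+1"=1e8+1, "1e7"=1e7, "10"=10), function(half) {
    # inner loop: one run per synchronisation variant
    rbindlist(lapply(c("nothing"=1L,"volatile"=2L,"volatile+shared"=3L,"atomic write"=4L,"atomic read write"=5L, "reduction"=6L, "cancellation"=7L), function(variant) {
      a2<-system.time(
        a1<-omp_flags(variant, n, half, 8)
      )
      # "time" returns timings, "iters" returns iterations made by each thread
      if (what == "time") as.list(a2)
      else if (what == "iters") as.list(a1)
    }), idcol="variant")
  }), idcol="halt")
}
do("iters")
do("time")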
      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
 1:  1e8+1           nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 2:  1e8+1          volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 3:  1e8+1   volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 4:  1e8+1      atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 5:  1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 6:  1e8+1         reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 7:  1e8+1      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 8:    1e7           nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 9:    1e7          volatile 10000000  9368188  2219459  2374732  2843583  2794889  2256903  2389781
10:    1e7   volatile+shared 10000000  9561233  2291372  2512503  3533221  3080863  2444200  2472036
11:    1e7      atomic write 10000000  9732664  2249224  2339365  2973864  2526774  2218488  2205693
12:    1e7 atomic read write 10000000 10005487  1710672  1566658  1930782  1774714  1555453  1497075
13:    1e7         reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14:    1e7      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15:     10           nothing       10        0        0        0        0        0        0        0
16:     10          volatile       10        0        0        0        0        0        0        0
17:     10   volatile+shared       10        0        0        0        0        0        0        0
18:     10      atomic write       10        0        0        0        0        0        0        0
19:     10 atomic read write       10        0        0        0        0        0        0        0
20:     10         reduction       10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21:     10      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>

      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
 1:  1e8+1           nothing     3.596    0.000   0.472      0.002     0.002
 2:  1e8+1          volatile     3.588    0.001   0.481      0.000     0.005
 3:  1e8+1   volatile+shared     3.529    0.001   0.477      0.001     0.003
 4:  1e8+1      atomic write     3.689    0.001   0.494      0.002     0.003
 5:  1e8+1 atomic read write     3.668    0.000   0.493      0.001     0.004
 6:  1e8+1         reduction     3.493    0.000   0.467      0.000     0.004
 7:  1e8+1      cancellation     4.284    0.001   0.572      0.000     0.005
 8:    1e7           nothing     3.460    0.000   0.461      0.001     0.003
 9:    1e7          volatile     2.852    0.000   0.363      0.001     0.004
10:    1e7   volatile+shared     2.654    0.000   0.335      0.002     0.003
11:    1e7      atomic write     2.608    0.000   0.330      0.002     0.002
12:    1e7 atomic read write     2.534    0.000   0.320      0.002     0.003
13:    1e7         reduction     3.279    0.001   0.446      0.000     0.005
14:    1e7      cancellation     4.361    0.001   0.581      0.000     0.004
15:     10           nothing     0.001    0.000   0.004      0.000     0.006
16:     10          volatile     0.045    0.000   0.010      0.002     0.002
17:     10   volatile+shared     0.038    0.000   0.009      0.000     0.005
18:     10      atomic write     0.000    0.000   0.004      0.000     0.005
19:     10 atomic read write     0.044    0.000   0.010      0.002     0.003
20:     10         reduction     2.520    0.000   0.382      0.000     0.004
21:     10      cancellation     4.296    0.002   0.576      0.000     0.005
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>

btw, I read that volatile should be avoided in favor of atomic.

@jangorecki
Member Author

If atomic write alone would do, then it seems the best fit.
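
For reference, a minimal sketch of a stop flag accessed through OpenMP atomics. This is not the code from `src/omp-flags.c`; the function name `count_until_halt` and its parameters are hypothetical. The read is made atomic here as well so the sketch has no data race, whereas an "atomic write" only variant would presumably leave the read plain.

```c
/* Hypothetical helper, not taken from omp-flags.c.
 * Counts how many iterations actually did work before the
 * shared stop flag was noticed. Compiles with or without -fopenmp. */
long count_until_halt(long n, long halt_at) {
  int halt = 0;   /* shared stop flag */
  long done = 0;  /* summed over threads by the reduction */
  #pragma omp parallel for shared(halt) reduction(+:done)
  for (long i = 0; i < n; i++) {
    int stop;
    #pragma omp atomic read
    stop = halt;
    if (stop) continue;  /* break is not allowed in an OpenMP loop */
    done++;
    if (i == halt_at) {
      #pragma omp atomic write
      halt = 1;
    }
  }
  return done;
}
```

If `halt_at` is never reached (for example `halt_at >= n`), every iteration runs and the function returns `n`; otherwise the result depends on how quickly the other threads observe the flag.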

@codecov

codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 0% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.90%. Comparing base (55b0de6) to head (e023ba3).
⚠️ Report is 4 commits behind head on master.

Files with missing lines   Patch %   Lines
src/omp-flags.c            0.00%     29 Missing ⚠️
R/utils.R                  0.00%     7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7372      +/-   ##
==========================================
- Coverage   99.11%   98.90%   -0.22%     
==========================================
  Files          85       86       +1     
  Lines       16443    16479      +36     
==========================================
  Hits        16298    16298              
- Misses        145      181      +36     


Co-authored-by: Benjamin Schwendinger <[email protected]>
@jangorecki
Member Author

jangorecki commented Oct 17, 2025

@ben-schwen it seems that the new options are not of much use, maybe it is a compiler issue? I am on a recent gcc, omp 201511.
I also observed this warning:

omp-flags.c: In function ‘benchmark_omp_flag’:
omp-flags.c:95:17: warning: ‘cancel for’ inside ‘nowait’ for construct [-Wopenmp]
   95 |         #pragma omp cancel for
      |                 ^~~
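
Some context on this warning, with a hedged sketch. As far as I understand the OpenMP specification, a worksharing loop that is cancelled must not carry a `nowait` clause, and cancellation is only active when the `OMP_CANCELLATION` environment variable is set to `true` before the program starts. If it is not set, `cancel for` is effectively a no-op, which may be why the cancellation rows show every thread running all of its iterations. The names `count_with_cancel` and `cancellation_enabled` below are hypothetical and not taken from `omp-flags.c`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: reports whether cancellation is active at runtime. */
int cancellation_enabled(void) {
#ifdef _OPENMP
  return omp_get_cancellation();
#else
  return 0;  /* built without OpenMP: no cancellation */
#endif
}

/* Hypothetical helper: counts iterations executed when one iteration
 * requests cancellation of the loop. Note: no nowait on the for construct. */
long count_with_cancel(long n, long halt_at) {
  long done = 0;
  #pragma omp parallel reduction(+:done)
  {
    #pragma omp for
    for (long i = 0; i < n; i++) {
      done++;
      if (i == halt_at) {
        #pragma omp cancel for
      }
      /* other threads check for a pending cancellation here */
      #pragma omp cancellation point for
    }
  }
  return done;
}
```

Running the benchmark with `OMP_CANCELLATION=true` set in the environment and checking `cancellation_enabled()` would show whether the cancellation variant is actually being exercised.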

@ben-schwen
Member

ben-schwen commented Oct 18, 2025

Interesting. With 20 threads I get this (which was my main motivation to include the reduction). I have gcc 11.4.0 and OpenMP 201511.

      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
 1:  1e8+1           nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 2:  1e8+1          volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 3:  1e8+1   volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 4:  1e8+1      atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 5:  1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 6:  1e8+1         reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 7:  1e8+1      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 8:    1e7           nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 9:    1e7          volatile 10000000  7523756  6139470 10163068  5543296  9819239 10011390 10223716
10:    1e7   volatile+shared 10000000  9721537 10062042  2737256 10091377  2760686  9725714  9918532
11:    1e7      atomic write 10000000 10143687  9958940 10128160  5549333  5473646  9742693 10022550
12:    1e7 atomic read write 10000000  9916360  9925820  2955716 10075786 10073474  9892110  2933117
13:    1e7         reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14:    1e7      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15:     10           nothing       10        0        0 12500000        0 12500000        0        0
16:     10          volatile       10        0        0        0    83056        0        0    74028
17:     10   volatile+shared       10    46021    19186    55439        0    28853    75632        0
18:     10      atomic write       10    90927    14187        0    95527    36864        0   142229
19:     10 atomic read write       10        0    69753   129379        0        0    90041        0
20:     10         reduction       10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21:     10      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>

      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
 1:  1e8+1           nothing     0.125    0.000   0.017      0.002     0.001
 2:  1e8+1          volatile     0.055    0.000   0.012      0.002     0.000
 3:  1e8+1   volatile+shared     0.080    0.000   0.014      0.003     0.000
 4:  1e8+1      atomic write     0.047    0.000   0.010      0.002     0.000
 5:  1e8+1 atomic read write     0.080    0.000   0.014      0.002     0.000
 6:  1e8+1         reduction     0.033    0.000   0.009      0.002     0.001
 7:  1e8+1      cancellation     0.964    0.000   0.126      0.002     0.000
 8:    1e7           nothing     0.070    0.000   0.011      0.003     0.000
 9:    1e7          volatile     0.053    0.000   0.012      0.002     0.000
10:    1e7   volatile+shared     0.057    0.000   0.011      0.002     0.000
11:    1e7      atomic write     0.015    0.004   0.007      0.003     0.000
12:    1e7 atomic read write     0.098    0.000   0.014      0.001     0.001
13:    1e7         reduction     0.062    0.000   0.011      0.002     0.000
14:    1e7      cancellation     1.012    0.000   0.123      0.002     0.001
15:     10           nothing     0.031    0.000   0.008      0.002     0.001
16:     10          volatile     0.069    0.000   0.010      0.002     0.000
17:     10   volatile+shared     0.054    0.000   0.010      0.002     0.000
18:     10      atomic write     0.001    0.000   0.003      0.001     0.002
19:     10 atomic read write     0.085    0.000   0.012      0.003     0.000
20:     10         reduction     0.031    0.000   0.008      0.001     0.002
21:     10      cancellation     1.061    0.000   0.152      0.002     0.000
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
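
One possible reading of the reduction rows, where only V1 stops early: with a `reduction` clause each thread works on its own private copy of the flag, and the copies are only combined when the loop ends, so a thread that sets the flag cannot stop the others. A minimal sketch, assuming the variant uses something like `reduction(||:halt)`; the function name `count_with_reduction` and its parameters are hypothetical, not the code from `omp-flags.c`.

```c
/* Hypothetical helper, not taken from omp-flags.c.
 * Each thread gets a private halt initialised to 0; the private
 * copies are OR-ed together only after the loop finishes. */
long count_with_reduction(long n, long halt_at, int *halted) {
  int halt = 0;
  long done = 0;
  #pragma omp parallel for reduction(||:halt) reduction(+:done)
  for (long i = 0; i < n; i++) {
    if (halt) continue;          /* sees only this thread's copy */
    done++;
    if (i == halt_at) halt = 1;  /* invisible to other threads for now */
  }
  *halted = halt;  /* combined value, available after the loop */
  return done;
}
```

Only the thread that executes iteration `halt_at` skips the rest of its chunk; the remaining threads finish all of their iterations, which matches the pattern of one short column and seven full ones.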

@jangorecki
Member Author

@ben-schwen can you also share the iterations made by each thread?
It is quite surprising that your code runs so much faster...

@ben-schwen
Member

@jangorecki I have added the iterations above. I have a 13th Gen Intel(R) Core(TM) i7-1370P CPU, which can apparently clock up to 5.2 GHz, which might be the reason why my times are flying.

@jangorecki
Member Author

I think we can close this PR; the changes suggested by this benchmark have been made in #7376 and #7361.

@jangorecki jangorecki closed this Oct 19, 2025